Bayesian Multi-type Mean Field Multi-agent Imitation Learning
Multi-agent imitation learning (MAIL) refers to the problem in which agents learn to perform a task interactively in a multi-agent system by observing and mimicking expert demonstrations, without any knowledge of a reward function from the environment. MAIL has received a lot of attention due to promising results on synthesized tasks, with the potential to be applied to complex real-world multi-agent tasks. Key challenges for MAIL include sample efficiency and scalability. In this paper, we propose Bayesian multi-type mean field multi-agent imitation learning (BM3IL). Our method improves sample efficiency by establishing a Bayesian formulation for MAIL, and enhances scalability by introducing a new multi-type mean field approximation. We demonstrate the performance of our algorithm by benchmarking it against three state-of-the-art multi-agent imitation learning algorithms on several tasks, including a multi-agent traffic optimization problem in a real-world transportation network. Experimental results indicate that our algorithm significantly outperforms all other algorithms in all scenarios.
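The abstract does not spell out the multi-type mean field approximation. As a rough illustration of the general idea behind mean field methods in multi-agent settings, and not the paper's actual formulation, the sketch below summarizes the other agents by one average action distribution per agent type, so that the summary has a fixed size regardless of how many agents there are. The function name `multi_type_mean_actions`, the discrete action encoding, and the hard type labels are all assumptions made for this example.

```python
import numpy as np

def multi_type_mean_actions(actions, types, n_actions, n_types):
    """Return an (n_types, n_actions) array holding, for each type,
    the empirical distribution of actions taken by agents of that type.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    actions = np.asarray(actions)
    types = np.asarray(types)
    one_hot = np.eye(n_actions)[actions]      # shape: (n_agents, n_actions)
    mean_actions = np.zeros((n_types, n_actions))
    for k in range(n_types):
        mask = types == k
        if mask.any():                        # leave zeros for empty types
            mean_actions[k] = one_hot[mask].mean(axis=0)
    return mean_actions

# Six agents, two types, three discrete actions (toy data).
actions = [0, 1, 1, 2, 2, 2]
types = [0, 0, 0, 1, 1, 1]
summary = multi_type_mean_actions(actions, types, n_actions=3, n_types=2)
print(summary)
```

An agent's policy or value function could then condition on `summary` instead of on the joint action of every other agent, which is where the scalability benefit of a mean field approximation typically comes from.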
Review for NeurIPS paper: Bayesian Multi-type Mean Field Multi-agent Imitation Learning
Clarity: The paper is generally well written, though it suffers from a lack of clarity in some important sections: 4. [Equation 1] I believe the inner log in the right-hand term of Equation (1) should not be present. I assumed it was a typo, but it is present throughout the text, even for the authors' proposed approach (e.g., in Equation 3). If intentional, why is this necessary? The paper introduces the problem scenario as a Markov game in Section 2.1; however, it introduces the notion of binary observations (which are a function of rewards here) in Section 3.1.1. This seems to suggest that perhaps the problem formulation should be corrected to a Partially Observable Markov game (POSG).
Review for NeurIPS paper: Bayesian Multi-type Mean Field Multi-agent Imitation Learning
All reviewers agree this paper is a clear accept. The most critical reviewer was satisfied by the authors' rebuttal, which addressed that reviewer's major concerns: the authors ran new experiments on a new domain, conducted a more thorough analysis of the attention mechanism in their previous experiments, and fixed some noted mistakes in their equations.